We propose the use of a self-oscillating dynamical system, the pre-Galileian clock
equation, for modeling the laryngeal tone. The parameters are shown to be the minimal
control needed for generating the prosody of human speech. Based on this model, we
outline a peak delay detection algorithm for extracting the prosody of real speech.

1. Introduction

One of the distinctive features of the human brain is its aptitude for communicating
thoughts by speaking. Despite the relevance of speech for exchanging information and
emotions, it is very hard to describe the auditory-phonatory apparatus by means of
physical models. The emotional component of verbal communication relies on the
fundamental sound of speech, the laryngeal sound, which is generated by the vocal
cords without any movement of the mouth or tongue.

The glottal sound has been studied for centuries within different disciplines,
and numerous explanations for its unique capability of transmitting emotions have
been proposed. Among these ideas, one of the most captivating is Jean-Jacques
Rousseau's theory about the relation between prosody and music. In his Essai sur
l'origine des langues, Chapter XII, he writes:

"La colère arrache des cris menaçans, que la langue et le palais articulent: mais la
voix de la tendresse est plus douce, c'est la glotte qui la modifie, et cette voix devient
un son; seulement les accens en sont plus fréquens ou plus rares, les inflexions plus
ou moins aiguës, selon le sentiment qui s'y joint".
Later, in the same book, he writes: 

"Qu'on fasse la même question sur la mélodie, la réponse vient d'elle-même: elle est
d'avance dans l'esprit des lecteurs. La mélodie, en imitant les inflexions de la voix,
exprime les plaintes, les cris de douleur ou de joie, les menaces, les gémissemens;
tous les signes vocaux des passions sont de son ressort. Elle imite les accens des



*This paper is dedicated to Francesco Guerra. 






2, 2008 3:50 WSPC/Guidelines articolo 



2 Roberto D'Autilia 

langues, et les tours affectés dans chaque idiome à certains mouvemens de l'âme:
elle n'imite pas seulement, elle parle; et son langage inarticulé, mais vif, ardent,
passionné, a cent fois plus d'énergie que la parole même".

We want to study the relation suggested by Rousseau between musical melodies
and the corresponding prosodic patterns, using a simple model for glottal motion. The
laryngeal sound is generated by the cyclic opening and closing of the vocal cords.
At the beginning of the cycle air is pushed by the diaphragm, the vocal cords are
drawn together and the air pressure increases; when the pressure reaches a critical
value it blows the vocal cords apart and air flows between them. The vocal cords
are then drawn together again as a result of the Bernoulli effect.

To understand this oscillating behavior "without spring", we study a simple
self-oscillating model, that of the pre-Galileian clock, which produces a realistic
laryngeal tone. The model can also be used as a powerful tool for the analysis of
glottal sounds. The results of the analysis can be compared with adiastematic
notation to suggest a formal correspondence between prosody and music. In this
direction it is possible to suggest that prosody is the drift of the evolution of
musical taste, as an answer to the main question about the nature of the stochastic
processes that produce "beautiful", or at least meaningful, sequences of sounds.

The paper is organized as follows. In the next section we introduce a nonlinear
dynamical system which exhibits all the main features of the glottal cycle. In
section 3 the parameters of the system are used as time-dependent controls for the
glottis, and in section 4 we present an algorithm for analyzing the control of
recorded sounds. The form of this control also suggests a delay-line-like behavior
for the cochlear apparatus.



2. The cycle of the glottis

The laryngeal tone is the oscillatory variation in air pressure generated by the
cyclic movement of the vocal cords. At the beginning of each cycle the vocal
cords are held together by the action of the arytenoid cartilages. Air is forced into
the trachea and, when the pressure exceeds a threshold (whose value depends
on the strength of the vocal cords), it opens the vocal cords and flows through
the glottis. Inside the constricted laryngeal passage the air velocity increases and
its pressure falls; by the Bernoulli principle, this pressure drop closes the vocal
cords and completes the cycle. The cycle repeats at rates of 130-220 times
per second. The ear perceives variations in the cycle period as changes in pitch.

The valve-like behavior producing the laryngeal tone is characteristic of
self-oscillating systems. A self-oscillating system is an apparatus which produces a
periodic process at the expense of a non-periodic source of energy. Self-oscillations
do not depend on the initial conditions but are determined by the properties of the
system itself. Examples of self-oscillating systems include the electric bell, saw-tooth
signal generators, and wind and string musical instruments.

Several realistic models for the glottal behavior have been proposed over the
years, starting with the celebrated "two-mass model". We will now present
a minimal mechanism that exhibits all the main features of glottal behavior. In
particular we want to make explicit both the dependence of the oscillation period
on the forces acting on the system, and the features of the trigger mechanism
producing the self-oscillations.

Let $s \in [-1, 1]$ be the variable related to the aperture of the glottis: at the
extreme positions, $s = -1$ indicates that the glottis is completely closed and $s = 1$
that the vocal cords are open. We represent $s$ in the interval $[-1, 1]$ in agreement
with the usual representation of acoustic signals. Assuming that the laryngeal tone
is proportional to the opening, $s$ can be identified with the signal itself

$$[T_0, T_1] \ni t \mapsto s(t) \in [-1, 1] \tag{1}$$

where $[T_0, T_1]$ is a time interval. For simplicity we assume that the forces act on
the glottis instantaneously: for $s(t) > +s_0$ the Bernoulli effect produces a force which
closes the glottis, for $s(t) < -s_0$ the pressure opens the glottis, and for
$-s_0 < s(t) < +s_0$ no force is acting on the glottis. This approximation is useful for
solving the model, but can be easily relaxed in computer simulations. When the glottis is
opening, the force $P(s)$ acting on it is negative, and when it is closing $P(s) > 0$.
Therefore over the interval $-s_0 < s < +s_0$ the force $P = P(s)$ is a two-valued
function of the variable $s$ representing the opening of the glottis. Following
Andronov's argument, $P(s)$ imposes limitations on the shape of the phase plane
trajectories, since assigning $(s, \dot{s})$ does not uniquely determine the state of the
system when $-s_0 < s < s_0$. Instead we have to use a phase surface with two
half-planes superimposed: (a) $s < s_0$ and (b) $s > -s_0$. The points on this two-sheet
phase surface are in one-to-one correspondence with the states of the system. The
representative point passes from sheet (a) to sheet (b) at $s = +s_0$ and back from
sheet (b) to sheet (a) at $s = -s_0$, and the abscissa remains unchanged in both cases.

To further simplify the model we assume that the force $P(s)$ applied to the
glottis by the air pressure is constant in absolute value: $P(s) = +P_0$ for the closing
and $P(s) = -P_0$ for the opening. To model the vocal cord tension we introduce the
constant resistance $f_0$, which does not depend on the position of the glottis. On the
basis of these simple assumptions it is possible to describe two different laryngeal
sounds. The first one, which we call prosody, does not have a natural period because
there is no elastic force contributing to the movement of the vocal cords. This system
does not exhibit high stability and is therefore a good model for cases in which the
period of oscillation has to be sensitive to variations in the control parameters. The
second type has a natural period due to the elastic term, and in the absence of feeding
can perform damped oscillations. This model can be used to describe singing, but
is beyond the scope of this paper.

The dynamic equation for the model without the elastic force is

$$m\ddot{s} = f(s, \dot{s}) + P(s) \tag{2}$$

where $m$ is the glottis mass, $P = P(s)$ the force produced by the air pressure on it and
$f(s, \dot{s})$ is the resistance of the vocal cords. If we assume
$f(s, \dot{s}) = -f_0\,\mathrm{sign}(\dot{s})$ during the motion ($\dot{s} \neq 0$),
equation (2) becomes

$$m\ddot{s} = -f_0\,\mathrm{sign}(\dot{s}) \pm P_0 \tag{3}$$

Introducing the variables $x = s/s_0$ and $z = \sqrt{P_0/(m s_0)}\,t$, equation (3) can be
rewritten as

$$\begin{cases} \dot{x} = y \\ \dot{y} = -F\,\mathrm{sign}(y) - (-1)^n \end{cases} \tag{4}$$

where the dot now denotes the derivative with respect to the scaled time $z$,
$y \neq 0$, $F = f_0/P_0$ and $n$ is even when the glottis is closing and odd when the
glottis is opening. If the air pressure outside the glottis is a linear function of the
glottis aperture, the $x$ variable can be studied as the waveform of the laryngeal tone.
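
As a minimal numerical sketch of system (4), the two-sheet dynamics can be integrated by keeping a flag for the current sheet and switching it only when $x$ crosses the far threshold $\pm 1$. The function name `simulate_glottis`, the fixed-step semi-implicit Euler scheme, the step `dz` and the initial state are illustrative assumptions, not part of the model itself.

```python
def simulate_glottis(F=0.5, dz=1e-3, n_steps=100_000, x0=0.0, y0=0.0):
    """Integrate x' = y, y' = -F*sign(y) - (-1)**n in the scaled time z.

    Returns the list of x samples (the scaled glottal aperture).
    The integration scheme and all default values are assumptions
    made for illustration only.
    """
    x, y = x0, y0
    opening = True  # sheet (a): n odd, so the drive -(-1)**n equals +1
    xs = []
    for _ in range(n_steps):
        drive = 1.0 if opening else -1.0
        # Dry friction opposes the motion; at rest it opposes the drive.
        direction = (y > 0) - (y < 0) if y != 0.0 else drive
        y += dz * (drive - F * direction)
        x += dz * y
        # Hysteretic trigger: leave a sheet only at its far boundary.
        if opening and x >= 1.0:
            opening = False
        elif not opening and x <= -1.0:
            opening = True
        xs.append(x)
    return xs


if __name__ == "__main__":
    samples = simulate_glottis(F=0.5)
    print("largest aperture over the last samples:", max(samples[-20000:]))
```

For $0 < F < 1$ the trajectory should settle on the same closed orbit whatever the initial state, so the late samples can be used as a synthetic laryngeal waveform.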

If $F > 1$ and the glottis is at rest ($y = 0$), the force produced by the air pressure
cannot open the vocal cords, $\dot{y} = 0$, and the point $(x, 0)$ is an equilibrium state.
When instead $f_0 < P_0$ ($F < 1$), the system has no states of equilibrium. From
equation (4) we have for the (a) half-plane ($x < +1$ and $P = -P_0$)

$$\frac{dy}{dx} = \frac{1 - F\,\mathrm{sign}(y)}{y} \tag{5}$$

and integrating we have $y^2/2 - (1 + F)x = k$ for $y < 0$ and $y^2/2 - (1 - F)x = k$ for
$y > 0$, $k = \mathrm{const}$, so that the phase path on sheet (a) is made of two parabolae,
and the representative point moves to the left on the lower half of the sheet ($y < 0$)
and to the right on the upper one. All the phase paths on sheet (a) reach its
boundary on the semi-axis $x = +1$, $y > 0$, and the phase paths on sheet (b) are
symmetrical to them with respect to the origin of the coordinates.

Following Andronov's treatment, we draw the two axes $v$, where $x = -1$,
$y = -v < 0$, and $v'$, where $x = 1$, $y = v' > 0$, and consider the sequence of
the points of intersection with them of an arbitrary phase path: $v, v_1, v_2, v_3, \ldots$.
The representative point passes at the point $(-1, -v)$ from sheet (b) to sheet (a)
and reaches the axis of the abscissa at the point $(-\xi, 0)$, where $\xi$ is given by the
equation $v^2 = 2(1 + F)(\xi - 1)$. Then the representative point moves on the upper
half of sheet (a) and reaches its boundary at $(1, v_1)$, where $v_1$ is given by
$v_1^2 = 2(1 - F)(\xi + 1)$. The phase path on (a) establishes a one-to-one correspondence
between the points of the axes $v$ and $v'$, a point transformation whose sequence
function is parametrized by the peak $\xi$ of the laryngeal tone. Then the representative
point passes onto sheet (b) and reaches the semi-axis $v$ at $(-1, -v_2)$, where $v_2$
is determined by the same sequence function (due to the symmetry of the sheets (a) and
(b)), so the point transformation of $v'$ into $v$ is the same as that of $v$ into $v'$. The
fixed point $v = v_1 = \bar{v}$, corresponding to a symmetric limit cycle, is given by the
equation $(1 + F)(\bar{\xi} - 1) = (1 - F)(\bar{\xi} + 1)$ and is determined by the glottal peak

$$\bar{\xi} = \frac{1}{F} \tag{6}$$

and

$$\bar{v}^2 = 2\,\frac{1 - F^2}{F}$$
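
The point transformation can be iterated numerically to watch the sequence $v, v_1, v_2, \ldots$ approach the fixed point. The sketch below works with the squared speeds, using only the two relations $v^2 = 2(1 + F)(\xi - 1)$ and $v_1^2 = 2(1 - F)(\xi + 1)$; the function names `next_v_squared` and `lamerey_ladder` and the starting value are illustrative assumptions.

```python
def next_v_squared(v_squared, F):
    """One step of the point transformation v^2 -> v_1^2 for 0 < F < 1."""
    xi = 1.0 + v_squared / (2.0 * (1.0 + F))  # from v^2 = 2(1+F)(xi-1)
    return 2.0 * (1.0 - F) * (xi + 1.0)       # v_1^2 = 2(1-F)(xi+1)


def lamerey_ladder(v_squared, F, n_steps):
    """Successive squared crossing speeds, starting from v_squared."""
    ladder = [v_squared]
    for _ in range(n_steps):
        ladder.append(next_v_squared(ladder[-1], F))
    return ladder


if __name__ == "__main__":
    F = 0.5
    for k, value in enumerate(lamerey_ladder(1.0, F, 8)):
        print(k, value)
    print("fixed point 2(1-F^2)/F =", 2.0 * (1.0 - F * F) / F)
```

Each step multiplies the distance from the fixed point by $(1 - F)/(1 + F)$, which is smaller than one for $0 < F < 1$, so the iteration converges geometrically from any starting speed.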



Lamerey's ladder, given by the two curves $v^2 = v^2(\xi)$ and $v_1^2 = v_1^2(\xi)$,
tends to the fixed point if $v^2 = 2(1 + F)(\xi - 1)$ has a steeper slope than
$v_1^2 = 2(1 - F)(\xi + 1)$, and the unique periodic motion is reached from any initial
conditions. The period of the oscillation can be easily computed by evaluating
the transit time on the parabolic arcs

$$T = 4\sqrt{\frac{2 m s_0}{P_0}}\,\frac{1}{\sqrt{F(1 - F^2)}} \tag{7}$$

and depends on both the air pressure and the vocal cord resistance.
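
The transit-time computation can be made explicit. The following worked derivation is a sketch that assumes the symmetric limit cycle of system (4) with $0 < F < 1$, the fixed-point speed $\bar{v}^2 = 2(1 - F^2)/F$ and the time scaling $z = \sqrt{P_0/(m s_0)}\,t$; the symbols $z_-$, $z_+$ and $Z$ are auxiliary names introduced here. On sheet (a) the acceleration is constant on each half: $\dot{y} = 1 + F$ while $y < 0$ and $\dot{y} = 1 - F$ while $y > 0$.

```latex
\begin{align*}
z_{-} &= \frac{\bar{v}}{1+F}
  && \text{(from } y=-\bar{v} \text{ to } y=0\text{)}, \\
z_{+} &= \frac{\bar{v}}{1-F}
  && \text{(from } y=0 \text{ to } y=+\bar{v}\text{)}, \\
Z &= 2\,(z_{-}+z_{+}) = \frac{4\,\bar{v}}{1-F^{2}}
   = \frac{4\sqrt{2}}{\sqrt{F\,(1-F^{2})}}
  && \text{(two symmetric sheets)}, \\
T &= Z\,\sqrt{\frac{m s_0}{P_0}}
   = 4\sqrt{\frac{2\,m s_0}{P_0}}\;\frac{1}{\sqrt{F\,(1-F^{2})}} .
\end{align*}
```

The period diverges both for $F \to 0$, where the amplitude $1/F$ grows without bound, and for $F \to 1$, where the resistance nearly balances the driving force.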





Fig. 1. The plot of the signal as a function of time (on the left) and the corresponding phase path
(on the right), where the two sheets are glued together and the vocal cord resistance is different
for the opening and the closing. The value of s has been scaled to fit in the [-1, 1] interval.



If we assume that the vocal cord resistance is not the same for the opening
and the closing glottis, the equation of motion gives the signal of figure 1. The
corresponding sound is very similar to the sound obtained by placing a microphone
on the throat and directly recording the glottal sound.

3. The prosodic patterns of the controlled glottis 

The stability of the system can be evaluated by the ratios of the percentage
variation of the period to the percentage variation of the two parameters, and
turns out to be very low. Because of this low stability the system is not suitable
for building clocks. However, its sensitivity to variations in the parameters and its
fast convergence to the limit cycle make it a suitable model for a controlled glottis.
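
The ratio between percentage variations can be estimated numerically as a logarithmic derivative. The sketch below assumes that the period of the symmetric limit cycle is $T = 4\sqrt{2 m s_0/P_0}\,/\sqrt{F(1 - F^2)}$ with $F = f_0/P_0$, as obtained from the transit times on the parabolic arcs; the function names `period` and `elasticity`, the unit values of `m` and `s0` and the sample parameters are illustrative assumptions.

```python
import math


def period(P0, f0, m=1.0, s0=1.0):
    """Period of the symmetric limit cycle (assumed closed form)."""
    F = f0 / P0
    if not 0.0 < F < 1.0:
        raise ValueError("self-oscillation requires 0 < f0 < P0")
    return 4.0 * math.sqrt(2.0 * m * s0 / P0) / math.sqrt(F * (1.0 - F * F))


def elasticity(func, value, rel_step=1e-6):
    """Percentage change of func per percentage change of its argument."""
    up = func(value * (1.0 + rel_step))
    down = func(value * (1.0 - rel_step))
    log_span = math.log1p(rel_step) - math.log1p(-rel_step)
    return (math.log(up) - math.log(down)) / log_span


if __name__ == "__main__":
    P0, f0 = 1.0, 0.5
    print("d ln T / d ln P0:", elasticity(lambda p: period(p, f0), P0))
    print("d ln T / d ln f0:", elasticity(lambda f: period(P0, f), f0))
```

Under the assumed formula the two ratios always add up to $-1/2$, because rescaling both forces by the same factor leaves $F$ unchanged and only changes the time scale.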

Let us suppose that the air pressure and the strength of the vocal cords are
controlled by the speaker. By controlling $P(t)$ and $f(t)$ it is possible to change the
amplitude (6) and the period (7) of the laryngeal tone to produce the prosodic
patterns conveying the non-semantic aspect of speech.

In order to decode the prosodic pattern we need to know the time evolution of
the control function

$$c(t) = \bigl(f(mT(t)),\, p(mT(t))\bigr) \tag{8}$$

In other words, we suggest that when the cochlea decodes the prosody of speech
it only needs to detect the value of $\xi$ and the time delay of $\xi$ with respect to
the previous one. This is in agreement with recent studies on delay lines inside
the cochlea. Therefore the system can be controlled by a piecewise constant
function which changes the period and the amplitude. If we choose a control
function (see figure 2), we determine the time evolution of $s(t)$ given by the
equation of motion. The resulting phase path and waveform are plotted in figure 3.
Note that in this case the phase paths intersect because the system is now
controlled and non-autonomous.

Fig. 2. A piecewise constant F control function, determining the prosodic pattern.




Fig. 3. The plot of the glottal sound controlled by the signal of figure 2 (left) and the
corresponding phase path (right).
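
A controlled waveform of this kind can be sketched by integrating the scaled system $\dot{x} = y$, $\dot{y} = -F\,\mathrm{sign}(y) - (-1)^n$ while $F$ is held constant on successive segments. The function name `simulate_controlled_glottis`, the representation of the control as `(duration, F)` pairs in scaled time, the Euler step and the sample values are illustrative assumptions and do not reproduce the control used for the figures.

```python
def simulate_controlled_glottis(segments, dz=1e-3):
    """Integrate the two-sheet oscillator under a piecewise constant F.

    segments: list of (duration_in_scaled_time, F) pairs with 0 < F < 1.
    Returns the list of x samples. Scheme and defaults are assumptions.
    """
    x, y = 0.0, 0.0
    opening = True  # start on the opening sheet, where the drive is +1
    xs = []
    for duration, F in segments:
        for _ in range(int(round(duration / dz))):
            drive = 1.0 if opening else -1.0
            # Dry friction opposes the motion; at rest it opposes the drive.
            direction = (y > 0) - (y < 0) if y != 0.0 else drive
            y += dz * (drive - F * direction)
            x += dz * y
            # Change sheet only when the far threshold is crossed.
            if opening and x >= 1.0:
                opening = False
            elif not opening and x <= -1.0:
                opening = True
            xs.append(x)
    return xs


if __name__ == "__main__":
    wave = simulate_controlled_glottis([(60.0, 0.5), (60.0, 0.8)])
    print("peak in first segment :", max(wave[40000:60000]))
    print("peak in second segment:", max(wave[-15000:]))
```

Because the control changes while the representative point is already moving, the trajectory after each switch starts away from the new limit cycle and relaxes onto it within a few cycles, which is why the phase paths of the controlled system can cross.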



We also observe that the characteristic time of the control change is larger than
the period of the wave, in agreement with studies comparing the laryngeal tone
period with the muscular control time. Therefore we state that the prosodic
information is all contained in $c$ and that, in order to decode the prosodic pattern,
the listener needs to decode the function $c$.

4. Perception of the laryngeal tone

The dynamical system given by equation (3) can be used to detect and represent
the prosody of real speech. To accomplish this we only need to know the discrete
function

$$[T_0, T_1] \ni t \mapsto \xi(t) \in [0, 1] \tag{9}$$

Even though the sound of (9) is different from the fundamental sound, their
prosody is identical. Moreover, during the speech process the time interval between
two consecutive $\xi$ is changed continuously to produce the prosodic meaning.

The $\xi(t)$ function can be extracted from a recorded sound by the simple
procedure of sliding a fixed-length time window over the signal and selecting, for
each window, the value of $s(t)$ that is not exceeded backward or forward. The
resulting signal is shown in figure 4 on the left.
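
One possible reading of this procedure is sketched below: a sample is kept as a glottal peak when it is positive and no sample within a fixed number of positions before or after it is larger. The function name `detect_peaks`, the `half_window` parameter, the positivity test and the synthetic test tone are illustrative assumptions.

```python
import math


def detect_peaks(signal, half_window):
    """Return (index, value) pairs for samples that are not exceeded
    within half_window samples backward or forward."""
    peaks = []
    for i in range(half_window, len(signal) - half_window):
        window = signal[i - half_window : i + half_window + 1]
        value = signal[i]
        # Keep positive samples that are the first maximum of their window,
        # so that a flat top is reported only once.
        if value > 0.0 and window.index(max(window)) == half_window:
            peaks.append((i, value))
    return peaks


if __name__ == "__main__":
    # Illustrative input: a pure tone with a period of 100 samples.
    tone = [math.sin(2.0 * math.pi * k / 100.0) for k in range(1000)]
    for index, value in detect_peaks(tone, half_window=30):
        print(index, round(value, 6))
```

The half window should be shorter than the shortest expected period, otherwise neighbouring glottal peaks suppress each other, and samples closer than `half_window` to either end of the recording are not examined.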

In order to obtain the control function $c$ we also have to know the value of $T(t)$.
To do this we plot the delay of each peak $\xi(\tau_i)$ with respect to the previous one
$\xi(\tau_{i-1})$ as a function of time (figure 4, right), where $\tau_1, \tau_2, \ldots, \tau_N$
are the times where $\xi$ is non-zero.




Fig. 4. The glottal peaks for the wave in figure 3 (left) and the $\xi$ delay (right) as a
function of time.
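
Given the peak times, the delay curve is a first difference. The helper below is a minimal sketch; the name `peak_delays` and the choice of attaching each delay to the later of the two peaks are assumptions.

```python
def peak_delays(peak_times):
    """Pair each peak time tau_i (i >= 1) with its delay tau_i - tau_(i-1)."""
    return [(later, later - earlier)
            for earlier, later in zip(peak_times, peak_times[1:])]


if __name__ == "__main__":
    # Peak positions expressed in samples (illustrative values).
    for time, delay in peak_delays([125, 225, 330, 430]):
        print(time, delay)
```

Plotting the second element of each pair against the first gives a period estimate that is updated at every glottal cycle.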

In this way it is possible to construct a map of the control $c$ and therefore to
decode the prosody of speech. The amplitude and the delay of $\xi$ are the only
information needed. The control function represented in figure 2 can be obtained
from the data represented in figure 4 via equations (6) and (7). If we instead take the
fundamental given by the Fast Fourier Transform of $s(t)$ we obtain only a rough map of
$c$, because the FFT needs several cycles to detect the period.
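
The inversion from one detected peak and its delay to the two control parameters can be sketched as follows. The sketch assumes that the peak is expressed in units of $s_0$ (so that it equals $1/F$ and exceeds one), that the delay equals the period $T = 4\sqrt{2 m s_0/P_0}\,/\sqrt{F(1 - F^2)}$, and that $m$ and $s_0$ are known. The function name `control_from_peak` and the unit default values are hypothetical.

```python
import math


def control_from_peak(xi, T, m=1.0, s0=1.0):
    """Recover (f0, P0) from a peak xi (in units of s0) and a period T.

    Assumes xi = 1/F and T = 4*sqrt(2*m*s0/P0)/sqrt(F*(1 - F**2)).
    """
    if xi <= 1.0:
        raise ValueError("a limit-cycle peak must exceed the threshold s0")
    F = 1.0 / xi
    P0 = 32.0 * m * s0 / (T * T * F * (1.0 - F * F))
    return F * P0, P0


if __name__ == "__main__":
    # Round trip on illustrative values: F = 0.5, P0 = 2, m = s0 = 1.
    T = 4.0 * math.sqrt(2.0 / 2.0) / math.sqrt(0.5 * (1.0 - 0.25))
    print(control_from_peak(2.0, T))
```

On recorded speech the amplitude is normalized and $s_0$ is not directly observable, so in practice such an inversion would only fix the control up to an overall calibration.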

When applied to real speech, the "peak detection" yields maps which resemble
the neumatic notation of Gregorian chant. This is not surprising given
that during the Middle Ages the rules of correspondence between text and
music were very strict, and the composer had to put known melodies together
rather than invent anything new. Following the ideas of Rousseau, the
algorithm proposed for the analysis and the synthesis of the laryngeal tone can be a
tool for studying the correspondence between prosody and music by means of the
comparison of the control maps with the adiastematic notation system.

5. Conclusion 

Although it is still not clear how the proposed system is related to the fluid
dynamics of the glottis, the self-oscillating system used to model the laryngeal behavior
is very simple and exhibits good agreement with the main features of glottal
sounds. It produces a realistic laryngeal sound and can be used to extract the prosody
from a recorded sound. Its main feature, shared with many "natural" oscillators,
is the lack of an elastic term. The control of this system also suggests that the
cochlear behavior is related to delay line detection. The equation is very simple
to simulate in real time on a computer, and can be used for generating prosodic
patterns or for manipulating the prosodic meaning of recorded sentences.